Estimating the short-term effect of PM2.5 on the mortality of cardiovascular diseases based on instrumental variables

Background PM2.5 can induce and aggravate the occurrence and development of cardiovascular diseases (CVDs). The objective of our study is to estimate the causal effect of PM2.5 on mortality rates associated with CVDs using the instrumental variables (IVs) method. Methods We extracted daily meteorological, PM2.5 and CVDs death data from 2016 to 2020 in Binzhou. Subsequently, we employed the general additive model (GAM), two-stage predictor substitution (2SPS), and control function (CFN) to analyze the association between PM2.5 and daily CVDs mortality. Results The 2SPS estimated the association between PM2.5 and daily CVDs mortality as 1.14% (95% CI: 1.04%, 1.14%) for every 10 µg/m3 increase in PM2.5. Meanwhile, the CFN estimated this association to be 1.05% (95% CI: 1.02%, 1.10%). The GAM estimated it as 0.85% (95% CI: 0.77%, 1.05%). PM2.5 also exhibited a statistically significant effect on the mortality rate of patients with ischaemic heart disease, myocardial infarction, or cerebrovascular accidents (P < 0.05). However, no significant association was observed between PM2.5 and hypertension. Conclusion PM2.5 was significantly associated with daily CVDs deaths (excluding hypertension). The estimates from the IVs method were slightly higher than those from the GAM. Previous studies based on GAM may have underestimated the impact of PM2.5 on CVDs. Supplementary Information The online version contains supplementary material available at 10.1186/s12889-024-18750-0.


Introduction
Air pollution has seriously affected people's health and has become an increasingly serious public health problem in China.With the continuous acceleration of urbanization in China, the problem of the urban atmospheric environment is becoming increasingly serious.PM 2.5 is the main component of air pollution and is also a characteristic indicator for evaluating the relationship between air pollution and disease burden.To date, PM 2.5 is still an important pollutant affecting the air quality in most regions of China.PM 2.5 can induce and aggravate the occurrence of cardiovascular diseases (CVDs).A large number of epidemiological studies have shown that outdoor air pollution poses a serious threat to human health [1][2][3][4][5].In different cities, when the concentration of air pollutants increases, the number of hospital visits and the number of deaths from CVDs increase to a certain extent.The mortality rate in cities with severe air pollution is significantly higher than that in less polluted cities.In addition, many toxicology and human exposure studies [6,7] have shown that PM 2.5 is associated with changes in blood pressure, inflammation, autonomic function, endothelial function, and thrombus formation.Among the various air pollutants in China, PM 2.5 is the most serious, and it also poses a great threat to the CVDs of residents.Therefore, accurate estimation of the causal impact of PM 2.5 on major CVDs is of great significance for further controlling air pollution emissions, formulating air quality standards, and improving residents' health.
Compared to experimental research, one of the most prominent limitations in causal inference for observational studies is the need for effective management of confounders.However, the instrumental variables (IVs) method is not susceptible to all confounders, and its theory and application in both linear and nonlinear models have been extensively studied [8,9].Schwartz [10] was the first to apply the IVs method to estimate the acute impact of air pollution.Furthermore, there is limited research on applying the control function (CFN) method to the analysis of air pollution.Additionally, few studies in China utilize IVs method to estimate the short-term effects of air pollution.Therefore, our study aims to utilize two IVs methods to estimate the robust and reliable short-term effects of PM 2.5 on the mortality of CVDs among residents in China.

Study area
Binzhou, Shandong Province, is a city with severe ambient PM 2.5 and a typical area where smog events frequently occur.Approximately 15,000 people die of CVDs every year, accounting for more than 50% of all deaths.The resident population is approximately 3.9 million, and the total area is 9,660 square km (http://tj.binzhou.gov.cn/), as shown in Supplemental Figure S1.Additionally, the flat terrain, relatively stable climate, and infrequent occurrence of extreme weather events, such as typhoons, are similar to the situation in most cities in China.It is regarded as an appropriate place to study the effects of PM 2.5 exposure on mortality from CVDs.

Exposure data
Most studies on estimating the health effects of environmental pollution often directly use the monitoring data from environmental monitoring stations as individual exposure levels without considering the spatial heterogeneity of pollutants within cities (for example, in our study, most of the monitoring stations were located in areas with large populations, as shown in Fig. 1.).This may eventually lead to bias in health impact assessments.Taking into account computational efficiency and providing a visual representation of the impact of various factors on PM 2.5 , we use the land use regression (LUR) model [11] to estimate the spatial and temporal distributions of PM 2.5 in Binzhou.We obtained PM 2.5 data from air quality monitoring points as the dependent variable.Land use, traffic, industrial emissions, meteorology, terrain, population distribution, and other factors were used as independent variables (Supplemental Table S1 and Supplemental Table S2).Then, the longitude and latitude coordinates of the deceased were obtained according to their address before death.Because studies examining the acute association of PM 2.5 with daily mortality commonly use similar 2-day means [10], we extracted PM 2.5 within the day of death (lag 0) and the day before death (lag 1).

Death data
The death data were obtained from the death registration report information system of the Binzhou Center for Disease Control and Prevention in Shandong Province includes the address of the deceased.The cause of death of the deceased was coded and classified according to the International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10), and the diseases included in this study were classified as CVDs (ICD-10 code: I00-I99) and major CVDs, including ischaemic heart disease (IHD, ICD-10 code: I20-25), cerebrovascular accident (CVA, ICD-10 code: I61, I63), myocardial infarction (MI, ICD-10 code: I21-22) and hypertension (HTN, ICD-10 code: I10-I15).Our study was reviewed by the Ethical Review Committee of the Binzhou Center for Disease Control and Prevention (Project No:202,301).Our study did not involve human experiments or the use of human tissue samples.All respondents and relevant personnel signed informed consent forms before the investigation.

Instrumental variables
When addressing unobserved confounders, the IVs model emerges as a primary tool to mitigate these challenges.The IVs was first proposed by P.G.Wright [12] to circumvent the influence of unobserved confounders.However, the IVs model must satisfy the following three basic assumptions, as shown in Fig. 2: 1. Independence: z is independent of c and u; 2. Correlation: z is related to x; 3. Exclusive: given x and c, u, z and y are independent.
It can be proven that asymptotically unbiased estimation of the causal effect can be obtained under these basic assumptions of the IVs [13].
Previous studies on the estimation of the health effects of air pollution using IVs [10,14,15] have guided our selection of IVs in this study.We have chosen wind speed (WS) and boundary layer height (BLH) as IVs.Put simply, under certain pollutant conditions, the height of the boundary layer in the vertical direction correlates directly with the effective air volume for pollutant diffusion and dilution.A higher BLH implies a larger volume of pollutants that can be diluted, facilitating the vertical dispersion of pollutants and thereby reducing their concentration [16].BLH is unlikely to be associated with daily mortality other than by affecting air pollution changes.Air pollutants emitted in local areas also exhibit characteristics of horizontal transport.The impact of local air pollution sources increases with decreasing WS and vice versa.Except for extreme events (such as typhoons), WS is unlikely to influence population mortality directly; rather, it is only air pollution that can affect population health.Changes in WS or BLH do not alter the behavior of the exposed population; for example, there is no association with other behaviors that affect short-term CVDs mortality (such as the number of cigarettes smoked, changes in daily diet, or alcohol consumption) [10].
However, PM 2.5 , WS and BLH may vary with time and temperature.Therefore, consistent with most previous literature [10,14,15], it is necessary to remove the influence of temperature and temporal trends.Specifically, first we fit the following model: In formula (1), β 0 represents the intercept, t denotes time, ns indicates the cubic natural spline, and time is the time to control the influence of long-term trends.df is the degree of freedom (obtained by cross-validation [17,18]), here, the degrees of freedom for the time spline and temperature spline are 52 and 15, respectively.tem t represents the temperature at time t, dow denotes the dummy variable for the day of the week to control the impact of short-term fluctuations, pm t is the PM 2.5 at time t, and 1 is the model residual.The 1 is independent of the temporal trends, seasons and temperature.It represents a component of PM 2.5 and comprises selected IVs and other factors.In this study 1 is used as the exposure variable, as shown in Fig. 3:

Statistical analysis Descriptive analysis
The mean, standard deviation, median, and other common descriptive statistical analyses were carried out on the meteorological data, air pollutant data, and daily deaths.The correlation between meteorological factors and PM 2.5 was analysed using Spearman rank correlation.

Land use regression model
Based on the results of PM 2.5 source apportionment and the common geographically related variables in the LUR model and considering the actual situation in Binzhou, in our study, we selected and obtained a large number of variables, such as road traffic conditions [19], land cover types [20], population density [21], impact of pollution emissions [22], topography [23], soil texture, vegetation indices [24] and large-scale water data.With each monitoring station serving as the central point.Buffered zones ranging from 0.05 to 10.00 km are generated around these monitoring sites.Specifically, for the range of 0.05 to 1.00 km, buffer layers are established at intervals of 0.05 km.In the 1.00 to 2.00 km range, buffer layers are set at intervals of 0.5 km, and for the 2.00 to 10.00 km range, buffer layers are established at intervals of 1.00 km.The area of each type of land use and coverage type, river length, water body area, traffic road length, amount of pollution discharge and elevation, temperature, humidity, WS, BLH, boundary layer dissipation, air pressure, precipitation, vegetation index, landform, terrain relief, distance to the nearest traffic road, distance to the nearest Fig. 3 DAG with our study (deaths is our outcome, 1 is the exposure, WS is wind speed, BLH is boundary layer height, c represents the known confounders (Such as temperature, etc.), u represents the unobserved confounders) Fig. 2 DAG with IVs (y is the outcome, x is the exposure, c represents the known confounders, u represents the unobserved confounders, and z is the IVs) traffic intersection, distance to the nearest water body, soil composition, population density, night light data, and distance to the monitoring site were counted.Taking the PM 2.5 at the site as the dependent variable, the above geographical variables were selected as the predictor variables and estimated by the random forest regression model.Finally, based on the previously constructed random forest regression model [25].The 10-fold crossvalidation coefficient of determination R 2 between PM 2.5 daily estimates and ground-based observations is 0.87, with a RMSE of 17.10 µg/m 3 (Supplemental Figure S2).Then, points (100 m×100 m) were uniformly distributed in the administrative area.The values of relevant variables at each grid point were collected and subsequently input into random forest model to calculate the estimated PM 2.5 at each grid point.Kriging interpolation [24,26] was used to obtain the PM 2.5 distribution on the surface.

Two-stage predictor substitution
The two-stage predictor substitution (2SPS) is a nonlinear extension of the two-stage least squares method and is also completed in two stages.In the first stage, the predicted value of exposure is obtained through nonlinear regression of IVs and exposure, and then the predicted value is substituted for the exposure in the second stage.Specifically, first we fit the following model: In formula (2), f is a nonlinear function, BLH t and W S t are IVs representing the BLH and WS, respectively, and 2 is the model residual.What's more, BLH and WS may capture some variation of air pollution that is missed by the others, so constructing an IV by combining the two can improve power and avoid the problems of weak IVs [10,15,27].Therefore, we combine the information on BLH and WS on the day of death (lag 0) and the day before death (lag 1) to generate a single pollutioncalibrated IV.In the first stage of the 2SPS method in our study, we employ support vector regression (SVR) [28] with a radial kernel to estimate the variations in 1 explained by BLH and WS.The ) is obtained through modelling the changes in 1 using SVR.It's important to note that 1 is independent of confounders.
However, the mean number of daily deaths varies over time and can lead to severe overdistribution if left untreated; thus, the time cubic natural spline is included in the model.The mortality rate generally obeys the Poisson distribution, so the function is log (•), and the second-stage regression is shown in formula (3): In formula (3), y t is the number of deaths at time t, β 3 is the intercept, β 4 is the coefficient of exposure variables estimated by the model, the degrees of freedom for the time spline are 32, 1 is the predicted value of 1 in formula (2) and 3 is the random error.

Control function
The CFN [9] is another method that uses IVs to solve unobserved confounders.Unlike the conventional IVs method, the CFN method addresses unobserved confounders by incorporating surrogate variables for confounders.The specific procedure of the CFN also comprises two stages.
Similarly, in the first stage, as shown in formula (4), f represents a nonlinear function, BLH t and W S t are IVs representing the BLH and WS, respectively, and 2 is the model residual.2 serves as a surrogate variable for confounders.This is because 1 is a component of PM 2.5 and consists of BLH, WS and other factors.These other factors may include confounders, as shown in Fig. 3.
In the second stage regression, 2 obtained from the SVR serves as a surrogate variable for confounders, by substituting 2 into formula (5), the effect of exposure can be determined [29].The analysis also accounts for the nonlinear impact of 2 .
log [E (y t )] = β 5 + β 6 • pm t +ns(ε 2 , df ) + ns (time, df ) + ε 5 In formula (5), y t represents the number of deaths at time t, β 5 is the intercept, β 6 is the coefficient of the exposure variables estimated by the model, pm t is the PM 2.5 at time t, the degrees of freedom for the time spline are 32, 2 is the residual value of the first-stage regression, and the degrees of freedom for the spline of 2 are 17, and 5 is the random error item of the model.

Generalized additive model
To compare with the 2SPS and CFN methods, we employed the general additive model (GAM) to estimate the impact of PM In formula (6), y t represents the number of deaths at time t, β 0 is the intercept, β 7 is the coefficient of PM 2.5 , pm t is the PM 2.5 at time t, tem t is the temperature at time t, dow is the dummy variable for the day of the week, time represents time, and 6 denotes the random error.

Time series bootstrap
In addition, when estimating parameter confidence intervals, most previous studies [10,15,30] have ignored the autocorrelation of time series data.In our study, the time series bootstrap (tsboot) method [31] was used to estimate the confidence intervals of the parameters.This is because when using the bootstrap method to estimate confidence intervals in a time series study, it's crucial to address the issue of autocorrelation within the time series data.The tsboot function retains blocks drawn during the sampling process rather than individual samples, preserving correlation information between sequences.Despite the correlation inherent in the time series data, autocorrelation coefficients may be negligible after a certain delay.Therefore, the data is divided into several intervals of fixed length, maintaining the order of sequences while considering the intervals to be approximately independent.Subsequently, bootstrap resampling is conducted on these intervals to estimate parameter confidence intervals.Therefore, our study uses tsboot to find the confidence intervals of parameters.

Negative control methods
Furthermore, our study employs negative control methods (NCMs) [32].By employing the IVs to derive postoutcome variables, we obtain 1 according to formula (2). 1 was used as a negative exposure to examine the presence of unobserved confounders.Through this method, we sought to validate whether unobserved or uncontrolled confounders in our model.
In formula (7), β 5 is the negative exposure coefficient estimated by the model, the degrees of freedom for the time spline are 32, 1 is the predicted value obtained through the IVs after death, and 4 is the random error.
The effect estimates are presented as excess risks (ERs) and percentage changes with 95% confidence interval (95% CI) for daily mortality associated with a 10 µg/m 3 increase in PM 2.5 .All the statistical tests were two-tailed, and results with P < 0.05 were considered to indicate statistical significance.In the effect estimation model, we employ the Poisson distribution as the distribution family and the logarithmic function as the link function.As a result, we exponentially transform the coefficients and confidence intervals in the model.The main analysis code is shown in Supplemental Code S1.

Descriptive statistics of meteorological factors and air pollutants
The average daily temperature was 289.80 ± 1.12 K, and the average WS was 2.47 m/s.The average concentration of PM 2.5 among those who died of CVDs was 111.63 µg/ m 3 , which was much higher than the 75 µg/m 3

Correlation analysis of meteorological factors, PM 2.5 concentration and temperature
The correlation analysis between BLH, WS and PM 2.5 showed that meteorological factors were negatively correlated with PM 2.5 (r=-0.17,r=-0.19), and there was a strong correlation between PM 2.5 and meteorological elements, reflecting that BLH and WS are important IVs for studying the impact of air pollutants on human health.Temperature was strongly correlated with BLH (r = 0.24) and was also correlated with daily deaths from CVDs(r=-0.57).See Table 3 for details.

The short-term effect of PM 2.5 on CVDs mortality
In our IVs method, temperature, short-term fluctuations, and long-term temporal trends explained 62.40% of the average PM 2.5 variation, and these effects were removed before fitting the predicted values 1 .IVs explained an average of 29.34% of the remaining variation in PM 2.5 .
The predicted value of 1 has a correlation of -0.03 with temperature, and 1 does not show a temporal trend.The negative exposure control method showed that negative exposure was not associated with CVDs mortality or major CVDs (P > 0.05).These findings substantiate the effectiveness of the IV assumptions, demonstrating that in such a context, the established association provides causal estimates for the impact of the locally generated air pollutant PM 2.5 on daily CVDs mortality rates.
However, there was no causal relationship between PM 2.5 and HTN.See Table 4 for details.

Discussion
This is the first study in China to utilize the IVs method to estimate the impact of PM 2.5 on CVDs.Employing 2SPS, CFN, GAM, and NCMs, we discovered a significant causal effect of the air pollutant PM 2.5 on daily mortality related to CVDs, IHD, MI, and CVA.However, no significant association was observed between PM 2.5 and HTN.
A substantial body of toxicological and human exposure research has revealed the biological pathways connecting PM 2.5 with daily CVDs in populations.Specifically, studies conducted at relevant doses have identified a significant association between PM 2.5 exposure and daily mortality rates.Human exposure investigations have demonstrated that exposure to air on busy streets (PM 2.5 =24 µg/m 3 ) for 5 h results in a 25% reduction in vascular dilation, an increase in sympathetic nervous system activity, and a decrease in parasympathetic nervous system activity compared to exposure to filtered air (PM 2.5 =3µg/m 3 ) [33].In an intervention experiment, participants who walked on the street for two hours exhibited lower blood pressure when wearing particle-filtering masks than when not wearing masks [34].Randomized controlled trial focusing on air filtration among elderly individuals revealed improved microvascular function after 48 h of exposure to filtered air [35].A recent randomized trial involving university students revealed associations between fine particles and HTN, insulin resistance, blood lipids, fasting blood glucose, cortisol, adrenaline, and noradrenaline [36].
In our study, negative exposure demonstrated no discernible impact on mortality rates and did not influence the estimated exposure effects.At the same time, the predicted values of IVs can explain nearly one-third of the remaining changes in PM 2.5 after controlling for time and temperature.These results demonstrate that the IV assumptions is valid and that this association provides a causal estimate of the effect of the locally produced PM 2.5 on daily CVDs mortality in such a scenario.Furthermore, our study found that the effect estimates obtained using the IVs method were higher than those obtained using the GAM.Similarly, Schwartz's study also demonstrated that the effect estimate of a 10 µg/m 3 increase in PM 2.5 on daily non-accidental mortality using the IVs method was 1.54% (95% CI: 1.12%∼1.97%),which was significantly higher than the GAM estimate of 0.98% (95% CI: 0.75 ∼ 1.22%) [15].Recently, Bae applied the IVs method to estimate the effect of O 3 on population mortality and found that for every 1 ppb increase in O 3 , there was a decrease of 0.37% (95% CI: -0.61%∼-0.14%) in non-accidental mortality.However, in previous linear models, there was no significant association between a 1 ppb increase in O 3 concentration and daily non-accidental mortality, with a regression coefficient of -0.00024 (P = 0.34) [14].The difference may be attributed to the fact that the GAM only controls for measured confounders to estimate the effect between air pollutants concentration and death from CVDs in the population.There are unobserved confounders that may affect the effect estimate, resulting in smaller results [14].Another possibility is that the particle variation captured by the IVs primarily consists of elemental and organic carbon particles from local fuel combustion, which may be more toxic than average particles [15].
At the same time, there were differences in the estimated values of the 2SPS and CFN.The estimated value of the 2SPS is higher than that of the CFN.In fact, for the linear model, the estimates of the two methods are equivalent, but for the nonlinear model, studies have shown that the two estimators are different [9].From the perspective of the model, the CFN, which incorporates surrogate variables of unobserved confounders into the regression, cannot fully control the influence of confounders because the distribution and influence mode of the unobserved confounders are completely unknown.Therefore, when applying the CFN, careful consideration should be given to the application context, acknowledging the unknown distribution and influence patterns of unobserved confounders.Within the causal framework of our study, we assert that the 2SPS yields estimates of the acute effects of the local air pollutant PM 2.5 on daily CVDs mortality that are more proximate to causal effects.In contrast, the CFN, which relies solely on time spline functions and surrogate variables for unobserved confounders, may not comprehensively account for unobservable factors (this limitation becomes particularly evident when the impact of unobserved confounders on exposure effects is substantial, especially in the presence of multiple unobserved confounding factors in the model).Therefore, unobserved confounders can lead to an underestimation of the short-term effects of air pollution exposure.Additionally, compared to traditional bootstrap methods, the confidence intervals obtained through the tsboot are slightly longer.Therefore, ignoring autocorrelation in time series data may result in an underestimation of the standard errors of effect estimates.(Supplemental Table S3).
In addition, our study has certain limitations.First, the IVs method assumes that there is no association between the IVs and the confounders, which cannot be tested, although our study proved that the 2SPS is not affected by unobserved confounders through the negative exposure method and the previous control, but the possibility of an association between IVs and unobserved confounders remains.Second, our study is based solely on data from one city.The composition and toxicity of PM 2.5 may vary among different cities, thus the conclusions drawn may not be directly applicable to other cities.Factors such as a city's specific geographical location, population density, industrial structure, and traffic conditions all influence the generation and dispersion of air pollutants, leading to variations in PM 2.5 components across cities.To gain a more comprehensive understanding of the impact of PM 2.5 on CVDs, future research will need to consider expanding the sample scope to encompass data from multiple regions.Additionally, conducting indepth analyses of regional differences will be necessary to derive more generalizable results.

Conclusions
Our study used IVs method to estimate the short-term effect of PM 2.5 exposure on the daily mortality of patients with CVDs (excluding HTN).The analysis was based on the causal framework method, and the observed associations were not subject to unobserved confounders.Additionally, compared to traditional bootstrap methods, the confidence intervals obtained through the tsboot are slightly longer.Compared to GAM, the effect estimates of PM 2.5 on mortality from CVDs in the city are higher when obtained through the 2SPS and CFN.

Fig. 1
Fig. 1 Air pollutant monitoring stations and population in Binzhou 2.5 on daily CVDs deaths in our dataset.The model included dummy variables for the day of the week, natural splines for temperature, ozone (O 3 ) and time.Similarly, the degrees of freedom for time and temperature were selected through GCV, resulting in 32 degrees of freedom for time, 15 for temperature, and 23 for O 3 .(O 3 were also obtained from a LUR model based on random forest algorithm, with a 10-fold cross-validation coefficient of determination R 2 between the O 3 daily estimate and ground-based observations is 0.90, and the RMSE is 9.09 µg/m 3 .The spatial resolution is 100 m).Specifically, we fit the following model: log [E (y t )] = β 0 + DOW + β 7 • pm t + ns (tem t , df ) +ns (time, df ) + ns (ozone t , df ) + ε 6

Table 1
Instrumental variables, temperature and PM 2.5 in Binzhou city ,2016-2020 Notes: WS is wind speed; BLH is boundary layer height; Tem is temperature; SD is standard deviation; Max is maximum value; Min is minimum value; P 25 is 25th percentile; P 75 is 75th percentile.

Table 3
Spearman correlation analysis of daily average PM 2.5 , temperature and meteorological parameters

Table 4
The analysis of GAM, 2SPS and CFN effects on PM 2.5 and deaths from main CVDs Notes: CVDs is cardiovascular diseases; IHD is ischaemic heart disease; MI is myocardial infarction; CVA is cerebrovascular accidents; HTN is hypertension.2SPS is two-stage predictor substitution method, CFN is control function method, GAM is generalized additive model